Author: Jacob Schreiber jmschreiber91@gmail.com
PyPore is a Python package for the analysis, visualization, and storage of nanopore data for the UCSC Nanopore Group. It focuses on being extensible through object orientation, and on speed, with computationally intensive code written in Cython. Currently the easiest way to get it is to clone it from the GitHub repository. It requires numpy and matplotlib.
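For example, from within a notebook, cloning and installing might look something like this minimal sketch (it assumes git is available and that the repository's setup.py handles installation):
# Sketch only: assumes git is available and that `python setup.py install` works for this repo.
!git clone https://github.com/jmschrei/PyPore.git
!cd PyPore && python setup.py install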
In [1]:
%matplotlib inline
import numpy as np
import PyPore
print "numpy version: {}".format( np.__version__ )
print "PyPore version: {}".format( PyPore.__version__ )
The UCSC Nanopore Group uses biological nanopores, which are single protein porins inserted into lipid bilayers contained in a salt solution. When a voltage is applied, salt molecules pass through the porin, and an ionic current can be sampled at a high frequency. Biomolecules in the solution are also pulled through the porin, and cause the ionic current to fluctuate in a sequence-specific manner. A common computational task is to try to infer properties of the sequence from the ionic current recorded.
However, before we get to that, let's see how we load up some data. Data is stored in Axon Binary Format (.abf) files, which each store 12.5 minutes (750 seconds) of data sampled at 100,000 samples per second. Loading this data is as simple as passing a filename to the File class; the pathname can be absolute or relative.
In [2]:
from PyPore.DataTypes import *
import seaborn as sns
file = File( "14418008-s04.abf" )
plt.figure( figsize=(20, 4))
file.plot( (0,10), downsample=100, c='k', alpha=0.66 )
plt.ylim(0, 130)
Out[2]:
Open channel current is ~110 pA, and corresponds to nothing but ions passing through the nanopore. Here we've plotted the first 10 seconds of a file, without analyzing it at all. We see that the file begins at open channel current, has a few transient blockades, then a longer blockade which appears to be structured in some manner, before returning briefly to open channel current.
When an experiment is run, a mixture of enzyme-bound and enzyme-free DNA strands pass through the pore. Enzyme-bound strands have a processive enzyme attached, which regulates the movement of the strand through the nanopore, slowing it down long enough for data to be recorded; they look like the event which goes from ~1.75 seconds to ~5 seconds. Enzyme-free strands pass through the nanopore too quickly for any meaningful data to be collected, and appear as the transient blockades from the start to ~1.75 seconds.
These transient blockades are meaningless to us. We are only interested in the enzyme-bound translocations. Fortunately, PyPore has a simple event detector which can pull these out by setting an ionic current threshold, and a duration threshold. By default, events are regions of ionic current below 90 pA which are longer than 1 second, and never go below -0.5 pA.
In [3]:
from PyPore.parsers import *
file.parse( lambda_event_parser() )
plt.figure( figsize=(20, 4))
file.plot( (0,10) )
plt.ylim(0, 130)
Out[3]:
The one detected event is shown in cyan, while the rest of the trace is shown in grey. Let's take a look at the entire file. We need to downsample what we plot, as the file is huge. We can pass in separate parameters for event handling and file handling, as shown below. The parameters shown are the defaults, and are given only to display how you could modify the plot to your liking.
In [4]:
plt.figure( figsize=(20, 4))
file.plot( event_downsample=5, file_downsample=100, downsample=10, file_kwargs={ 'c':'k', 'alpha':0.66 },
event_kwargs={ 'c': 'c', 'alpha':0.66 } )
There are many regions which would otherwise be considered events, except that the current passed below 0 pA. The phenomena which look like vertical lines correspond to the user temporarily reversing the applied voltage in order to clear a blockage of the pore. Strands often get stuck and must be ejected because they are not translocating normally. These should not be included in the list of events from the file. However, we can show what would be considered an event if we set the minimum-current filter to a value that is never reached.
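As a rough sketch, that might look like the following. The threshold parameter is confirmed (it is used again later in this notebook), but the rules argument, a list of per-event predicates, is an assumption about lambda_event_parser's API, so check the source before relying on it.
# Sketch only: `threshold` appears elsewhere in this notebook; the `rules`
# list of per-event predicates is assumed, not confirmed.
permissive_parser = lambda_event_parser( threshold=90,
                                         rules=[ lambda event: event.duration > 1.,
                                                 lambda event: event.min > -2000. ] )
file.parse( permissive_parser )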
Let's look at some statistics of the file before moving on.
In [5]:
print "File has {} events, and has average ionic current {} (+/- {})".format( file.n, file.mean, file.std )
Now let's look at individual events.
In [6]:
event = file.events[0]
plt.figure( figsize=(20,4) )
event.plot()
Commonly, we want to apply a low-pass filter to the data, and then segment the event into the discrete segments we see. We expect that the segments contain sequence information, and so we want to be able to isolate them. For segmentation, we have a recursive divide-and-conquer algorithm, described in a recent pending publication, which can handle arbitrarily filtered data. A Cython implementation of this algorithm is SpeedyStatSplit. We then plot the segmentation using a four-color cycle, which goes red-blue-orange-green.
Feel free to play around with the parameters to see how they affect the segmentation, if you're reading this in an IPython notebook.
In [7]:
event.filter()
event.parse( SpeedyStatSplit( prior_segments_per_second=10., cutoff_freq=2000., min_width=100 ) )
plt.figure( figsize=(20,4))
event.plot( color='cycle' )
It may look like some areas are oversegmented, but when you zoom in on those areas, they do appear to be different segments. Whether or not this fluctuation is due to a new sequence being read by the nanopore, or some other slight difference in ion passage is debatable. However, it is not the job of the segmenter to determine what the underlying biochemical phenomena is, simply whether or not a region of ionic current is different.
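One way to check for yourself is to re-plot the segmented event and restrict the x-axis to a short window; this assumes, as the file plots above suggest, that the x-axis is time in seconds, and the window chosen below is arbitrary.
# Zoom in on an arbitrary half-second window of the segmented event to inspect
# whether closely spaced segments really sit at different current levels.
plt.figure( figsize=(20, 4) )
event.plot( color='cycle' )
plt.xlim( 1.0, 1.5 )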
Let's look more at the segmenter, and try to relate the parameter "prior_segments_per_second" to the observed number of segments per second.
In [8]:
prior_segments_range = 10 ** np.arange( -1, 5.5, 0.5 )
n_segments = np.zeros( ( len( prior_segments_range ), file.n ) )

for j, event in enumerate( file.events[1:] ):
    event.filter( order=1, cutoff=2000 )
    for i, prior in enumerate( prior_segments_range ):
        event.parse( SpeedyStatSplit( prior_segments_per_second=prior, cutoff_freq=2000. ) )
        n_segments[i, j] = 1. * event.n / event.duration

plt.figure( figsize=(10,5) )
plt.plot( prior_segments_range, n_segments, c='c', alpha=0.66 )
plt.xscale( 'log' )
plt.yscale( 'log' )
plt.xlabel( 'Prior Segments Per Second' )
plt.ylabel( 'Observed Segments Per Second' )
Out[8]:
It is extremely easy to go through all the events and collect some data. In this case, we segment each event using a wide variety of segmenter parameters, and then collect the number of segments produced.
The number of segments doesn't seem to change much at low priors, but then begins to grow as the prior approaches the sampling frequency. The default minimum width of a segment is 100 samples, in order to remove edge effects, and so we see the observed number of segments approach $\frac{sampling\ frequency}{min\ width}$; with 100,000 samples per second and a minimum width of 100 samples, that ceiling is 1,000 segments per second. This is to be expected, as the segmenter will begin to call every region of length equal to the minimum width its own segment, regardless of the scoring function.
A takeaway is that while the prior should be of the correct order of magnitude, one need not worry about getting it precisely right.
Now, there was a cutoff_freq parameter, which indicates the frequency at which the data was filtered. What happens if we filter the event, but don't mention that to the segmenter? As a note, we have already filtered this event object to 2 kHz above, and so we shouldn't do it again. A comment is left in for the code which should be run if the event had not been previously filtered.
In [9]:
event = file.events[0]
#event.filter( order=1, cutoff=2000 )
event.parse( SpeedyStatSplit( prior_segments_per_second=10. ) )
event.plot( color='cycle' )
We get drastic oversegmentation if we do this; it's completely unusable. This is because filtering makes the samples no longer i.i.d., by replacing each sample with one influenced by the original sample's neighbors. The scoring function of the segmenter assumes that each sample is independent of every other sample, and so needs to make a correction if the data has been filtered.
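A quick way to see why is to low-pass filter pure white noise and compare the lag-1 autocorrelation before and after. The sketch below uses scipy directly rather than PyPore's filter, with the same order-1, 2 kHz settings used above and the 100 kHz sampling rate of these files.
from scipy.signal import butter, filtfilt
# White noise is i.i.d. by construction; after an order-1, 2 kHz low-pass filter
# (at 100 kHz sampling), neighboring samples become strongly correlated.
noise = np.random.randn( 100000 )
b, a = butter( 1, 2000. / ( 100000. / 2. ), btype='low' )
filtered = filtfilt( b, a, noise )
print "lag-1 autocorrelation, raw:      {:.3f}".format( np.corrcoef( noise[:-1], noise[1:] )[0, 1] )
print "lag-1 autocorrelation, filtered: {:.3f}".format( np.corrcoef( filtered[:-1], filtered[1:] )[0, 1] )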
Once an event is segmented, it is represented as a list of segments. A segment is just a container for an array of ionic current and some metadata about that current; it does not have any analysis methods. If we resegment the event taking the cutoff frequency into account, and then plot the second segment (the first segment is often the downspike from open channel), we get the following.
In [10]:
event.parse( SpeedyStatSplit( prior_segments_per_second=10., cutoff_freq=2000. ))
segment = event.segments[1]
plt.plot( segment.current )
plt.show()
Its representation is a JSON of the metadata.
In [11]:
segment
Out[11]:
Segments themselves aren't terribly interesting, but we often want their metadata. Downstream analyses, such as hidden Markov models, require tuples of properties from these segments, most often just their mean, but sometimes their standard deviation and duration as well. For example, here is how we might display the mean of every segment in an event.
In [12]:
print ' '.join( str( round( segment.mean, 2 ) ) for segment in event.segments )
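If a downstream model needs more than the mean, the same pattern extends naturally, for example to tuples of the three properties mentioned above.
# One (mean, standard deviation, duration) tuple per segment.
features = [ ( s.mean, s.std, s.duration ) for s in event.segments ]
print features[:2]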
A common operation is to have a large set of files, and to go through each of them applying the same event detector and segmenter, ending up with events reduced to their metadata. The Experiment object allows you to do this easily, in a memory-efficient manner. The parameters passed in below are the defaults. If None is passed in as the segmenter, the events are not segmented, and if None is passed in for filter_params, the events are not filtered. verbose=True gives a running summary of what is happening, and meta=True stores only the metadata, not the full ionic current array. The full verbose log is a bit much, but is included just to show the summary given.
In [13]:
exp = Experiment( [ '14418004-s04.abf', '14418005-s04.abf' ] )
exp.parse( event_detector=lambda_event_parser( threshold=90 ),
segmenter=SpeedyStatSplit( prior_segments_per_second=10, cutoff_freq=2000. ),
filter_params=(1,2000),
verbose=True,
meta=False )
Given that events are sequential, hidden Markov models (HMMs) are an obvious way to try to analyze this data. PyPore does not implement HMMs itself, but support for YAHMM is built in. YAHMM is a general HMM package for Python which is implemented in Cython for speed. The gist is that a model object represents an HMM, and has methods such as model.viterbi, model.forward, and model.train, which cover the common HMM operations. In principle, anyone can use their own HMM implementation as long as it follows the same interface.
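For reference, a minimal standalone YAHMM model might look something like the sketch below; the State, Distribution, and model.start / model.end / bake calls reflect my reading of the YAHMM API, so treat it as illustrative rather than authoritative.
from yahmm import *
# Illustrative two-state model: a 'match' state emitting segments near 30 pA and
# a uniform 'insert' state, mirroring the insert distribution used further down.
model = Model( name="toy" )
match = State( NormalDistribution( 30., 2. ), name="match" )
insert = State( UniformDistribution( 0., 90. ), name="insert" )
model.add_state( match )
model.add_state( insert )
model.add_transition( model.start, match, 1.00 )
model.add_transition( match, match, 0.70 )
model.add_transition( match, insert, 0.10 )
model.add_transition( match, model.end, 0.20 )
model.add_transition( insert, insert, 0.10 )
model.add_transition( insert, match, 0.90 )
model.bake()
logp, path = model.viterbi( [ 29.5, 31.2, 45.0, 30.1 ] )
print "logp: {}".format( logp )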
However, PyPore has some meta-operations which allow you to build complicated models from individual units. See more here. Let's quickly build a reference profile from hand-curated data, and see how well it works on this data. We can use the PyPore helper function ModularProfileModel to quickly build an HMM from a reference.
In [14]:
from PyPore.hmm import *
from epigenetics import *
model = ModularProfileModel( Phi29GlobalAlignmentModule, build_profile(), "Epigenetics-54", insert=UniformDistribution(0,90) )
There are three ways to apply HMMs to nanopore data. A common task is to get the Viterbi path of the sequence through the model. This can be done two ways, if you just want to use the mean.
In [15]:
# Way 1
logp, path = event.apply_hmm( model, algorithm='viterbi' )
print "logp: {}".format( logp )
print ", ".join( state.name for idx, state in path )
In [16]:
# Way 2
logp, path = model.viterbi( [ segment.mean for segment in event.segments ] )
print "logp: {}".format( logp )
print ", ".join( state.name for idx, state in path )
If you're interested in using more than just the mean, you have to use the second way, and that would be something like logp, path = model.viterbi( [ (s.mean, s.std, s.duration) for s in event.segments ] ), using whichever properties you want to use.
Another common thing is to calculate the sum-of-all-paths log probability of the sequence given the model.
In [17]:
print event.apply_hmm( model, algorithm='log_probability' )
We can now replicate the analysis done in the profile HMM paper. Again, see this notebook for more background on what is going on.
Now let's pull in the data and reduce it down to events as lists of means, using the Experiment object described before. Since we're using a lot of data, let's not print the actual log (though you may want to in a Python script). Let's also turn meta on, since we only care about the means and the files are gigabytes in size. We'll also import some helper functions from another script, found here. This shows how simple it is to run complicated HMMs over nanopore data.
In [19]:
from epigenetics import *
files = [ '14418004-s04.abf', '14418005-s04.abf', '14418006-s04.abf',
'14418007-s04.abf', '14418008-s04.abf', '14418009-s04.abf',
'14418010-s04.abf', '14418011-s04.abf', '14418012-s04.abf',
'14418013-s04.abf', '14418014-s04.abf', '14418015-s04.abf',
'14418016-s04.abf' ]
model = EpigeneticsModel( build_profile(), "Epigenetics-54" )
exp = Experiment( files )
exp.parse( meta=True, verbose=False )
We can confirm the number of files which were analyzed with exp.n, and access the file metadata using exp.files.
In [20]:
exp.n
Out[20]:
Let's reduce the data down to lists of means, to run through the classification function.
In [21]:
events = reduce( list.__add__, [ [ [ segment.mean for segment in event.segments ] for event in file.events ] for file in exp.files ] )
We should now train on 70 percent of the data, and test on the remaining 30 percent. However, we don't want to train on junk events. We will run the classification function on all the training events using the untrained model, and only use events which have a filter score of at least 0.1 for Baum-Welch training.
In [22]:
training_events, testing_events = events[ : int(len(events)*.7) ], events[ int(len(events)*.7) : ]
print "Training Events: {}, Testing Events: {}".format( len(training_events), len(testing_events))
training_data = analyze_events( training_events, model )
training_events = [ event for score, event in zip( training_data['Filter Score'], training_events ) if score > 0.1 ]
model.train( training_events, max_iterations=10, use_pseudocount=True )
Out[22]:
Now let's classify the testing data using the trained model.
In [23]:
testing_data = analyze_events( testing_events, model )
We can now show how the filter score relates to the mean cumulative soft call (MCSC), which answers the question 'what is the accuracy if I only use events this good or better?'
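Concretely, after sorting the events by filter score in descending order, the MCSC at rank $i$ is the running mean of the soft calls, $MCSC_i = \frac{1}{i}\sum_{j=1}^{i} Soft\ Call_j$, which is exactly what the next cell computes.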
In [24]:
testing_data = testing_data.sort( 'Filter Score' )[::-1]
testing_data['MCSC'] = [ sum( testing_data['Soft Call'][:i] ) / i for i in xrange( 1, len(testing_data['Soft Call'])+1 ) ]
plt.plot( testing_data['Filter Score'], testing_data['MCSC'] )
plt.ylabel( 'MCSC' )
plt.xlabel( 'Filter Score' )
Out[24]:
A repository of IPython notebooks for the UCSC nanopore lab can be found here; they involve using PyPore and YAHMM in various ways to solve computational tasks.
We've seen how to use PyPore to analyze nanopore data extracted from .abf files. This includes pulling the raw data, detecting events, segmenting them, and using a profile HMM in order to extract the useful segments from the data. This pipeline is entirely automated, except for the profile construction.
PyPore is freely available under the MIT license at https://github.com/jmschrei/PyPore. Feel free to contribute or comment!
Installing PyPore is as easy as pip install pythonic-porin. Dependencies are numpy, Cython, and matplotlib; optional dependencies are PyQt4 and MySQLdb. A good way to get these dependencies is the Anaconda Scientific Python Distribution, or the unofficial list of Windows binaries.
If you have questions or comments, my email is jmschreiber91@gmail.com.